A Full English Sentence Database for Off-line Handwriting Recognition

Authors

  • Urs-Viktor Marti
  • Horst Bunke
Abstract

In this paper we present a new database for off-line handwriting recognition, together with a few preprocessing and text segmentation procedures. The database is based on the Lancaster-Oslo/Bergen (LOB) corpus, a collection of texts that were used to generate forms, which writers then filled out by hand. As of December 1998, the database includes 556 forms produced by approximately 250 different writers. The database consists of full English sentences and can serve as a basis for a variety of handwriting recognition tasks. The main focus, however, is on recognition techniques that use linguistic knowledge beyond the lexicon level. This knowledge can be derived automatically from the corpus or supplied from external sources.
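
As one illustration of linguistic knowledge beyond the lexicon level that can be derived automatically from a text corpus, the sketch below estimates word bigram probabilities by maximum likelihood. It is a minimal example under stated assumptions, not the authors' method: the function name `bigram_probabilities`, the whitespace tokenization, the `<s>`/`</s>` boundary markers, and the toy sentences are all hypothetical and are not taken from the LOB corpus.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(next_word | word) by maximum likelihood from raw sentences."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Assumption: lowercase and split on whitespace; add boundary markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        # Count every token that has a successor, and every adjacent pair.
        unigram_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (w1, w2): count / unigram_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Toy sentences standing in for corpus text (hypothetical, not from LOB).
toy_corpus = [
    "the form was filled out by hand",
    "the form was scanned",
]
probs = bigram_probabilities(toy_corpus)
```

A recognizer could use such probabilities to prefer word sequences that are likely in the corpus; a practical system would also need smoothing for unseen word pairs, which this sketch omits.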

Similar resources

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Using an artificial neural network approach for off-line sentence segmentation

This paper uses an Artificial Neural Network (ANN) architecture to segment unconstrained English handwritten sentences into single words. The ANN receives a feature set of the handwritten text line and classifies each column of the image as belonging either to a word or to a gap between words. As a result, the sequences of columns with the same classification represent the segmented words or inter-word gaps....

Strategies for Combining On-line and Off-line Information in an On-line Handwriting Recognition System

This paper investigates the cooperation of on-line and off-line handwritten word recognition systems. Our goal is to improve a mature on-line recognition system by exploiting the complementary information present in the off-line representation built from the on-line signal. After describing the on-line and off-line HMM-based handwriting recognition systems, we propose a formal framework, which allo...

A Novel Comprehensive Database for Arabic Off-Line Handwriting Recognition

This paper presents work toward developing a new comprehensive database for Arabic off-line handwriting recognition. The database includes isolated Indian digits, numerical strings, Arabic isolated letters, and a collection of 70 Arabic words. The database also includes a free-format sample of an Arabic date. A data entry form was designed to collect written samples from Arabic native spe...

English Sentence Recognition using Artificial Neural Network through Mouse-based Gestures

Handwriting is one of the most important means of daily communication. Although the problem of handwriting recognition has been studied for more than 60 years, there are still many open issues, especially in the task of unconstrained handwritten sentence recognition. This paper focuses on an automatic system that recognizes continuous English sentences through mouse-based gestures in real-t...

Publication date: 1999